ABSTRACT

Generalized Blomqvist Correlation, a generalization of the double median test, is first formulated as a new U statistic with a lower variance. Several open questions are answered, and some examples are given.
1. Introduction


Generalized Blomqvist Correlation (GBC), originally presented by the author [1], is a nonparametric test for independence of two random variables, say X and Y. The xy plane is divided into $n^2$ regions by $n-1$ order statistics of each of the X and Y samples. This is a generalization of the double median test, since we allow the number of partitions to increase by using the additional information of order statistics other than the median.

We first present a brief review of the double median test and its generalization for the population. Second, we present the new U statistic for calculating the sample correlation coefficient, give its mean and variance, and discuss some asymptotic properties. Finally, several examples are presented comparing GBC with Kendall's $\tau$ and Spearman's $\rho_s$.

The problem that led to the development of GBC was that of determining the correlation between a sample of points in a digital image and a template, in an attempt to locate edges in the original image. We wish to know whether the additional information provided by added order statistics gives a more statistically significant estimate of the correlation.


2. Medial axis correlation and the generalization


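We define as a measure of correlation the difference in probabilities

$$q = \pi_s - \pi_d$$

where

$$\pi_s = \mathrm{Prob}\{(x > x_0 \text{ and } y > y_0) \text{ or } (x < x_0 \text{ and } y < y_0)\}$$

$$\pi_d = \mathrm{Prob}\{(x > x_0 \text{ and } y < y_0) \text{ or } (x < x_0 \text{ and } y > y_0)\}$$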

The probability $\pi_s$ is the probability that the deviations of x from the chosen $x_0$ and of y from $y_0$ have the same sign. The probability $\pi_d$ is the probability that the deviations have different signs. If we let $x_0 = x_m$ and $y_0 = y_m$, where $x_m$ is the median of x and $y_m$ is the median of y, we have medial axis correlation.

The sample analog is constructed by dividing the xy plane into four regions by the lines $x = x_m$ and $y = y_m$. The sample correlation coefficient $q'$ (after Blomqvist [2]) is given by

$$q' = \frac{n_1 - n_2}{n_1 + n_2}$$

where

$n_1$ = the number of samples $(x_i, y_i)$ such that $x_i < x_m$ and $y_i < y_m$, or $x_i > x_m$ and $y_i > y_m$;
$n_2$ = the number of samples $(x_i, y_i)$ such that $x_i < x_m$ and $y_i > y_m$, or $x_i > x_m$ and $y_i < y_m$.

Clearly, $n_1 + n_2 = N$, the number of samples. Procedures for dealing with odd sample sizes are discussed in [1] and [2].
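A minimal Python sketch of Blomqvist's sample statistic $q'$ follows. How to treat a point lying exactly on a median line is an assumption here: this sketch simply discards such points, whereas [1] and [2] discuss the proper procedures for odd sample sizes. The name `blomqvist_q` is hypothetical.

```python
from statistics import median


def blomqvist_q(xs, ys):
    """Sample medial axis correlation q' = (n1 - n2) / (n1 + n2).

    Assumption of this sketch: points on either median line are discarded.
    """
    xm, ym = median(xs), median(ys)
    n1 = n2 = 0
    for x, y in zip(xs, ys):
        dx, dy = x - xm, y - ym
        if dx == 0 or dy == 0:
            continue                  # on a median line: counted in neither n1 nor n2
        if (dx > 0) == (dy > 0):
            n1 += 1                   # same-sign deviations (lower left / upper right)
        else:
            n2 += 1                   # opposite-sign deviations (upper left / lower right)
    return (n1 - n2) / (n1 + n2)
```

For example, `blomqvist_q([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5])` finds four same-sign and two opposite-sign points, so it returns 1/3 (as a float).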


The above statistic is generalized by further subdividing the xy plane with additional, equally spaced order statistics. The xy plane is divided into $n^2$ regions by $n-1$ x order statistics and $n-1$ y order statistics.


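Let $\xi_i^{(n)}$ denote the $i$th x order statistic from GBC of order n, and let $\eta_i^{(n)}$ denote the $i$th y order statistic from GBC of order n. Thus

$$\pi_s^{(n)} = \sum_{i=1}^{n} \mathrm{Prob}\{\xi_{i-1}^{(n)} < x < \xi_i^{(n)} \text{ and } \eta_{i-1}^{(n)} < y < \eta_i^{(n)}\}$$

$$\pi_d^{(n)} = \sum_{i=1}^{n} \mathrm{Prob}\{\xi_{i-1}^{(n)} < x < \xi_i^{(n)} \text{ and } \eta_{n-i}^{(n)} < y < \eta_{n-i+1}^{(n)}\}$$

where $\xi_0^{(n)}$ and $\eta_0^{(n)}$ are taken to be $-\infty$, and $\xi_n^{(n)}$ and $\eta_n^{(n)}$ are taken to be $+\infty$. By analogy with the medial axis case, the correlation of order n is $q_{(n)} = \pi_s^{(n)} - \pi_d^{(n)}$.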
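To make the population quantities concrete, the sketch below estimates $q_{(n)} = \pi_s^{(n)} - \pi_d^{(n)}$ by Monte Carlo. Two assumptions are made for illustration: the data come from a standard bivariate normal with correlation $\rho$, and the cut points $\xi_i^{(n)}$, $\eta_i^{(n)}$ are taken as the population $i/n$ quantiles of the N(0, 1) marginals. The name `population_gbc` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def population_gbc(rho, n, draws=100_000, seed=1):
    """Monte Carlo estimate of q_(n) = pi_s^(n) - pi_d^(n) for a standard
    bivariate normal with correlation rho (illustrative assumption)."""
    rng = random.Random(seed)
    std = NormalDist()
    # Assumed cut points: the i/n quantiles of N(0, 1), i = 1, ..., n-1.
    cuts = [std.inv_cdf(i / n) for i in range(1, n)]

    def interval(v):
        # 1-based interval number: 1 + number of cut points below v.
        return 1 + sum(v > c for c in cuts)

    s = math.sqrt(1.0 - rho * rho)
    same = diff = 0
    for _ in range(draws):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + s * rng.gauss(0.0, 1.0)   # corr(x, y) = rho
        i, j = interval(x), interval(y)
        same += i == j             # region on diagonal 1 (contributes to pi_s)
        diff += i == n + 1 - j     # region on diagonal 2 (contributes to pi_d)
    return (same - diff) / draws
```

For n = 2 the estimate can be compared with the standard bivariate normal result $q = (2/\pi)\arcsin\rho$ for medial axis correlation, which equals 1/3 at $\rho = 0.5$; under independence ($\rho = 0$) the estimate is near 0 for every n.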

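Let $r_{ij}^{(n)}$ denote the region of the xy plane where

$$\xi_{i-1}^{(n)} < x < \xi_i^{(n)} \quad\text{and}\quad \eta_{j-1}^{(n)} < y < \eta_j^{(n)} \qquad (4)$$

Let $|r_{ij}^{(n)}|$ denote the number of samples in $r_{ij}^{(n)}$.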


The sample statistic $q'_{(n)}$ is computed by

$$q'_{(n)} = \frac{1}{N}\left(\sum_{i=1}^{n} |r_{ii}^{(n)}| - \sum_{i=1}^{n} |r_{i,n-i+1}^{(n)}|\right) \qquad (5)$$

where N is the number of samples and n is the number of subdivisions along the x or y axis, which results in $n^2$ regions in the xy plane.
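The region-count form of equation (5) can be sketched as below. The sketch assumes that n divides N and that there are no ties; it takes as cut points the order statistics $x_{(m)}, x_{(2m)}, \ldots, x_{((n-1)m)}$ with $m = N/n$ (likewise for y) and places a sample equal to a cut point in the interval below it. These conventions, and the name `gbc_region_counts`, are assumptions of the sketch.

```python
from bisect import bisect_left


def gbc_region_counts(xs, ys, n):
    """q'_(n) from the region counts |r_ij^(n)| of eq. (5).

    Assumes N = len(xs) is a multiple of n and there are no ties.
    """
    N = len(xs)
    if N % n:
        raise ValueError("this sketch requires n to divide N")
    m = N // n
    xcut = sorted(xs)[m - 1:N - 1:m]     # the n-1 x order statistics
    ycut = sorted(ys)[m - 1:N - 1:m]     # the n-1 y order statistics
    counts = [[0] * n for _ in range(n)]  # counts[i][j] = |r_{i+1, j+1}|
    for x, y in zip(xs, ys):
        # bisect_left = number of cut points strictly below the value
        counts[bisect_left(xcut, x)][bisect_left(ycut, y)] += 1
    diag1 = sum(counts[i][i] for i in range(n))          # regions r_ii
    diag2 = sum(counts[i][n - 1 - i] for i in range(n))  # regions r_{i,n-i+1}
    return (diag1 - diag2) / N
```

With `xs = [1, ..., 8]` and `ys = [2, 1, 4, 3, 6, 5, 8, 7]`, order 4 puts every point on diagonal 1 (value 1.0), while order 8 puts every point off both diagonals (value 0.0).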

An alternative method for computing the sample analog is presented in the following section. It will allow us to investigate the asymptotic properties of the statistic, including its asymptotic relative efficiency.


3. The U statistic for computing $q'_{(n)}$

In equation (5) we computed the sample correlation coefficient by counting the number of samples that lie in the regions on the diagonals of the xy plane; see Figure 1. Points in regions on diagonal 1 are counted in the first sum of equation (5), whereas points in regions along diagonal 2 are counted in the second sum. Points along diagonal 1 are seen as contributing to positive correlation, and points along diagonal 2 as contributing to negative correlation.


We can reformulate the computation of $q'_{(n)}$ by viewing this statistic as a sum of functions of the sample points, $\phi(x_i, y_i)$. We let $\phi(\cdot)$ be 1 if the point $(x_i, y_i)$ is in one of the regions along diagonal 1, $-1$ if $(x_i, y_i)$ is in a region along diagonal 2, and zero otherwise. This reformulated statistic will be referred to as U, which is given by

$$U = \frac{1}{N} \sum_{i=1}^{N} \phi(x_i, y_i) \qquad (6)$$

where N is the number of sample points and $\phi(\cdot)$ is described above. We now show how to compute $\phi(x_i, y_i)$.

Recall that we have subdivided each axis (x or y) into n intervals. A point $(x_i, y_i)$ is said to be in x interval k if $\xi_{k-1}^{(n)} < x_i < \xi_k^{(n)}$, and in y interval k if $\eta_{k-1}^{(n)} < y_i < \eta_k^{(n)}$. Clearly k varies from 1 to n. Thus a point is in one of the regions along diagonal 1 if its x interval number equals its y interval number. Also, a point is in one of the regions along diagonal 2 if its x interval number equals $n + 1 -$ (its y interval number).


Clearly, N/n is the number of points in each interval along either axis. Thus the x interval number of $x_i$ is 1 if the rank of $x_i$ (denoted $R_i$) is between 1 and N/n. We compute the x interval number by

$$\text{x interval number} = \left\lceil \frac{R_i}{N/n} \right\rceil \qquad (7)$$

Equivalently, the y interval number is computed by replacing $R_i$ with $Q_i$, the rank of $y_i$. We can now express the condition that the x interval number equals the y interval number by

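$$\left\lceil \frac{R_i}{N/n} \right\rceil = \left\lceil \frac{Q_i}{N/n} \right\rceil \qquad (8)$$

where $\lceil x \rceil$ is the ceiling function, the least integer $\geq x$. Then $\phi(x_i, y_i)$ is defined as

$$\phi(x_i, y_i) = \begin{cases} 1 & \text{if } \lceil R_i/(N/n) \rceil = \lceil Q_i/(N/n) \rceil \\ -1 & \text{if } \lceil R_i/(N/n) \rceil = n + 1 - \lceil Q_i/(N/n) \rceil \\ 0 & \text{otherwise} \end{cases} \qquad (9)$$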
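The rank-based kernel of equations (7) to (10) can be sketched as follows (the names `ranks`, `phi` and `gbc_u_statistic` are hypothetical). The sketch computes the ceiling in integer arithmetic, assumes no ties and that n divides N, and, for odd n, lets the central region (which lies on both diagonals) contribute 0, as the two sums of equation (5) do; read literally, equation (9) would give that region +1.

```python
def ranks(values):
    """1-based ranks R_i of the values; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out


def phi(R, Q, N, n):
    """Kernel phi(Z_i) of eq. (9) from the x-rank R and y-rank Q of one sample."""
    m = N // n                 # N/n points per interval (n assumed to divide N)
    kx = (R + m - 1) // m      # x interval number ceil(R / (N/n)), eq. (7)
    ky = (Q + m - 1) // m      # y interval number ceil(Q / (N/n))
    # +1 on diagonal 1, -1 on diagonal 2; for odd n the central region lies
    # on both diagonals and nets to 0, matching the two sums of eq. (5).
    return int(kx == ky) - int(kx == n + 1 - ky)


def gbc_u_statistic(xs, ys, n):
    """U = (1/N) * sum_i phi(Z_i), eq. (10)."""
    N = len(xs)
    if N % n:
        raise ValueError("this sketch assumes n divides N")
    R, Q = ranks(xs), ranks(ys)
    return sum(phi(r, q, N, n) for r, q in zip(R, Q)) / N
```

Because only ranks enter, any strictly increasing transformation of x or y leaves U unchanged.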


We are careful to note that $\phi$ has one sample as an argument. We denote by $Z_i$ the single sample point $(x_i, y_i)$ and write

$$U = \frac{1}{N} \sum_{i=1}^{N} \phi(Z_i) \qquad (10)$$

for $\phi(\cdot)$ defined above. Once we show that $q_{(n)}$ is estimable of degree 1, it follows that the statistic given in equation (10) is a U statistic. Recall that this statistic is being used to test the hypotheses


$$H_0: F(x, y) = F(x)F(y)$$

$$H_1: F(x, y) \neq F(x)F(y)$$

or some subclass of $H_1$. We now show

Theorem: $q_{(n)}$ is estimable of degree 1.

Proof: To show that $q_{(n)}$ is estimable of degree 1, we show that there exists a function $\phi$ of one argument such that

$$E\{\phi(Z_i)\} = q_{(n)}$$

We let $\phi(Z_i)$ be the function defined in equation (9). Under the null hypothesis of independence, all regions of the x-y plane are equally likely; thus

$$E\{\phi(Z_i)\} = 0$$

which is precisely $q_{(n)}$ under the null hypothesis. Hence $q_{(n)}$ is estimable of degree 1. ∎
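Under $H_0$ each of the $n^2$ regions has probability $1/n^2$, so the moments of $\phi$ used in the proof above and in equation (14) can be checked by exact enumeration. The sketch below (hypothetical name `null_moments`) uses exact fractions and lets the central region of an odd n contribute 0, as in equation (5).

```python
from fractions import Fraction


def null_moments(n):
    """Exact E{phi(Z)} and E{phi(Z)^2} when every region r_ij has probability 1/n^2."""
    p = Fraction(1, n * n)
    mean = second = Fraction(0)
    for i in range(1, n + 1):          # x interval number
        for j in range(1, n + 1):      # y interval number
            # phi on region r_ij: +1 on diagonal 1, -1 on diagonal 2
            # (the central region of an odd n lies on both and nets to 0)
            value = int(i == j) - int(i == n + 1 - j)
            mean += p * value
            second += p * value * value
    return mean, second
```

For even n it returns $(0, 2/n)$, in agreement with $\zeta_1 = 2/n$; for odd n the shared central region makes $E\{\phi^2\} = 2(n-1)/n^2$ under this convention.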

From Hoeffding [3], we know that $q'_{(n)}$ is asymptotically normally distributed, with mean under $H_0$ given by

$$E\{q'_{(n)}\} = q_{(n)} = 0 \qquad (11)$$

and the variance is

$$\mathrm{Var}\{q'_{(n)}\} = \frac{\zeta_1}{N} \qquad (12)$$

where $\zeta_1$ is

$$\zeta_1 = E\{\phi^2(Z_1)\} - (q_{(n)})^2 \qquad (13)$$

To determine the variance of the statistic $q'_{(n)}$, we first compute

$$\zeta_1 = E\{\phi^2(Z_i)\} - (q_{(n)})^2 = \frac{2n}{n^2} - 0 = \frac{2}{n} \qquad (14)$$

since $\phi^2(Z_i)$ is 0 or 1, and, as before, $(q_{(n)})^2$ is 0. The variance is

$$\mathrm{Var}\{q'_{(n)}\} = \frac{2}{Nn} \qquad (15)$$
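A Monte Carlo check of equation (15) is sketched below. It relies on the fact that under $H_0$ the y-ranks form a uniformly random permutation relative to the x-ranks, so it shuffles ranks directly instead of drawing continuous data. The name `simulate_null_variance` and the replication count are arbitrary choices of the sketch.

```python
import random
import statistics


def simulate_null_variance(N, n, reps=4000, seed=0):
    """Monte Carlo variance of q'_(n) under independence (n assumed to divide N)."""
    if N % n:
        raise ValueError("this sketch assumes n divides N")
    m = N // n                       # points per interval, N/n
    rng = random.Random(seed)
    y_ranks = list(range(1, N + 1))  # y-rank of the point whose x-rank is R
    draws = []
    for _ in range(reps):
        rng.shuffle(y_ranks)         # H0: y-ranks are a random permutation
        total = 0
        for R, Q in enumerate(y_ranks, start=1):
            kx = (R + m - 1) // m    # ceil(R / (N/n)), eq. (7)
            ky = (Q + m - 1) // m
            total += int(kx == ky) - int(kx == n + 1 - ky)
        draws.append(total / N)
    return statistics.variance(draws)
```

For N = 200 and n = 4 the estimate should be close to $2/(Nn) = 0.0025$. At finite N the exact permutation variance for even n is $2/(n(N-1))$, which differs from (15) only by the factor $N/(N-1)$.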

Appendix 1 contains tables of critical values of $q'_{(n)}$ for n = 2 to 8 and for sample sizes from N = 2 to 30. The significance levels indicated are one-tailed; for two-tailed tests they should be doubled. These tables are used in the examples that follow. First, we present some asymptotic results concerning $q'_{(n)}$.


4. Asymptotic properties of $q'_{(n)}$

The asymptotic relative efficiency (ARE) of two statistical tests is the limiting ratio of the sample sizes they require to achieve the same power at the same level of significance: if the ARE of the first test relative to the second is ARE, the first test needs only (100/ARE)% of the sample size of the second test. We investigate the ARE of $q'_{(n)}$ (Generalized Blomqvist Correlation) relative to $q'$ (medial axis correlation). The reader is referred to Gibbons [4] for a detailed explanation with examples. The ARE is defined as


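$$\mathrm{ARE}(q'_{(n)}, q') = \lim_{H_1 \to H_0} \frac{e(q'_{(n)})}{e(q')} \qquad (16)$$

where the efficacy $e(\cdot)$ is

$$e(T) = \frac{\left[ \left. dE(T)/d\theta \right|_{\theta = \theta_0} \right]^2}{\left. \mathrm{Var}(T) \right|_{\theta = \theta_0}} \qquad (17)$$

for a test statistic T. For both $q'_{(n)}$ and $q'$, $dE(T)/d\theta$ is 1, so that

$$e(q'_{(n)}) = \frac{1}{\mathrm{Var}\{q'_{(n)}\}} \qquad (18)$$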


The variance of $q'_{(n)}$ is given in equation (15), and from Blomqvist [2] the variance of $q'$ is

$$[4\alpha_0(1 - 2\alpha_0)]/k$$

where $\alpha_0$ is $\mathrm{Prob}\{x < x_m \text{ and } y < y_m\}$ in the neighborhood of $H_0$.

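The ARE is

$$\mathrm{ARE}(q'_{(n)}, q') = \lim_{N \to \infty,\ H_1 \to H_0} \frac{[4\alpha_0(1 - 2\alpha_0)]/k}{2/(Nn)} = \lim_{N \to \infty,\ H_1 \to H_0} 4\alpha_0(1 - 2\alpha_0)\,\frac{Nn}{2k} \qquad (19)$$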


Since $N = 2k$ (see Blomqvist [2]),

$$\mathrm{ARE}(q'_{(n)}, q') = \lim_{H_1 \to H_0} 4\alpha_0(1 - 2\alpha_0)\, n \qquad (20)$$

Now, taking the limit as $H_1 \to H_0$, $\alpha_0 = 1/4$ and

$$\mathrm{ARE}(q'_{(n)}, q') = \frac{n}{2} \qquad (21)$$

The sample size for $q'_{(n)}$ need only be $2/n$ times the sample size for $q'$ to achieve the same power at the same significance level. We can check this by recalling that for n = 2, GBC is exactly medial axis correlation, and we would expect the ARE to be 1, which it is. In the limit, there would be one sample in each interval when n = N; thus

$$\mathrm{ARE}(q'_{(N)}, q') = N/2 \qquad (22)$$
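The arithmetic from (19) to (21) can be checked with exact fractions. The sketch below (hypothetical function names) evaluates equation (20) at $\alpha_0 = 1/4$, recomputes the same value as the variance ratio $\mathrm{Var}(q')/\mathrm{Var}(q'_{(n)})$ with $k = N/2$, and forms the ratio for two GBC orders $n_1$ and $n_2$ from equation (15).

```python
from fractions import Fraction


def are_from_eq20(n, alpha0=Fraction(1, 4)):
    """Equation (20) at H0, where alpha0 = Prob{x < x_m and y < y_m} = 1/4."""
    return 4 * alpha0 * (1 - 2 * alpha0) * n


def are_from_variances(n, N):
    """Var(q') / Var(q'_(n)), Blomqvist's variance at alpha0 = 1/4 with k = N/2."""
    alpha0, k = Fraction(1, 4), Fraction(N, 2)
    var_medial = 4 * alpha0 * (1 - 2 * alpha0) / k   # [4 a0 (1 - 2 a0)] / k
    var_gbc = Fraction(2, N * n)                     # eq. (15)
    return var_medial / var_gbc


def are_between_orders(n1, n2, N=100):
    """ARE(q'_(n1), q'_(n2)) = Var(q'_(n2)) / Var(q'_(n1)); N cancels."""
    return Fraction(2, N * n2) / Fraction(2, N * n1)
```

For n = 4 both routes give an ARE of 2, so GBC of order 4 needs about half the sample size of the medial axis test; the reciprocal $2/n$ is the sample-size ratio stated above.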


5. Examples using Generalized Blomqvist Correlation


In this section, we present several examples of using GBC in practice. We compare GBC of order 4 with Spearman's rho, Kendall's tau, and the medial axis correlation. Samples of size 20 and 30 are drawn from bivariate normal and exponential distributions with known correlation.

The first example is Hajek's data [5]. This sample is of size 20 with a known correlation $\rho$ (listed in Figure 2). Hajek showed that with both Spearman's test and Kendall's test we could reject the hypothesis of independence at $\alpha = .10$, but could not reject it at $\alpha = .05$. Hajek's results for the quadrant test, which is precisely GBC of order 2, indicate that the null hypothesis cannot be rejected at $\alpha = .10$. The same is true of GBC of order 4. The results of this and the following examples are presented in Figure 2. In all cases, we test the null hypothesis of independence against the alternative of positive dependence.

In Figure 2, we present results for two more bivariate normal distributions and three bivariate exponential distributions whose form is taken from Mardia [6]. The real correlation coefficient is indicated, along with those computed using Spearman's test, Kendall's test, the medial axis test, and GBC of order 4. GBC(2) denotes the medial axis test, which is identical to GBC of order 2, and GBC(4) denotes GBC of order 4. The associated significance levels are indicated below each correlation coefficient, where ns means not significant at $\alpha = .10$. The first of the three normal distribution examples is Hajek's data. Since sample sizes were varied, the sample size is indicated along with the actual correlation.
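A sketch of how the four coefficients compared in Figure 2 might be computed for one sample follows. Unlike the paper, which uses the exact critical values tabulated in Appendix 1, it assumes normal approximations with the standard null variances $1/(N-1)$ for Spearman's rho, $2(2N+5)/(9N(N-1))$ for Kendall's tau, and $2/(Nn)$ from equation (15) for GBC. The name `compare_tests` is hypothetical; no ties are assumed, and each GBC order must divide N.

```python
import math
from statistics import NormalDist


def ranks(values):
    """1-based ranks; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out


def compare_tests(xs, ys, orders=(2, 4)):
    """Return {name: (coefficient, one-sided p-value)} for H1: positive dependence.

    p-values are normal approximations (an assumption of this sketch).
    """
    N = len(xs)
    R, Q = ranks(xs), ranks(ys)
    coef, var = {}, {}

    d2 = sum((r - q) ** 2 for r, q in zip(R, Q))
    coef["spearman"] = 1 - 6 * d2 / (N * (N * N - 1))
    var["spearman"] = 1 / (N - 1)

    conc = disc = 0
    for i in range(N):
        for j in range(i + 1, N):
            s = (R[i] - R[j]) * (Q[i] - Q[j])
            conc += s > 0          # concordant pair
            disc += s < 0          # discordant pair
    coef["kendall"] = (conc - disc) / (N * (N - 1) / 2)
    var["kendall"] = 2 * (2 * N + 5) / (9 * N * (N - 1))

    for n in orders:
        if N % n:
            raise ValueError(f"order {n} does not divide N = {N}")
        m = N // n
        total = 0
        for r, q in zip(R, Q):
            kx, ky = (r + m - 1) // m, (q + m - 1) // m   # interval numbers, eq. (7)
            total += int(kx == ky) - int(kx == n + 1 - ky)
        coef[f"GBC({n})"] = total / N
        var[f"GBC({n})"] = 2 / (N * n)                    # eq. (15)

    std = NormalDist()
    return {k: (coef[k], 1 - std.cdf(coef[k] / math.sqrt(var[k]))) for k in coef}
```

For samples as small as N = 20, the tabulated critical values should be preferred to these approximate p-values.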




6. Conclusion


In this paper a generalization of the medial axis correlation test was presented. Using Hoeffding's results on U statistics, we were able to determine the asymptotic distribution of GBC. The asymptotic relative efficiency of GBC to medial axis correlation was derived and shown to be the ratio of the number of intervals used in GBC to the number of intervals used in medial axis correlation, which is two. More generally, the ARE of $q'_{(n_1)}$ to $q'_{(n_2)}$ is simply $n_1/n_2$.

The critical values of $q'_{(n)}$ for n = 2 to 8 were tabulated, and they are used in several examples presented in this paper. The null hypothesis is that of independence versus an alternative of positive dependence. The first three examples are from a bivariate normal distribution with known correlation. The results indicate that both orders of GBC tested, including the medial axis test, are inferior to Spearman's rho and Kendall's tau. This is expected, since it is well known that the medial axis test is less efficient than either Spearman's or Kendall's test against normal alternatives.


The final three examples were taken from bivariate exponential distributions with known correlation. In these three cases, both GBC(2) and GBC(4) performed better than either Spearman's or Kendall's test: in all three examples, both achieved a higher level of significance. The second example, with $\rho = .5$ and N = 30, best shows the advantage of GBC: we were able to achieve a higher level of significance by increasing the number of order statistics used to partition the xy plane. This behavior is intuitively plausible, as one can see upon closer examination of a single quadrant of the xy plane in the medial axis test.

If we partition the xy plane for GBC(4), the chosen quadrant is itself divided into four quadrants. Two of these are "on" diagonal (upper left, lower right) and the other two are "off" diagonal (lower left, upper right). It is natural to expect that if the two random variables are positively correlated, the points that lie in the chosen quadrant will actually lie in the "on" diagonal regions when the quadrant is further partitioned. This is true for the upper left and lower right quadrants. For the lower left and upper right quadrants, if the two random variables are negatively correlated, we expect the points to lie in the "off" diagonal regions when partitioned. Figure 3 illustrates this concept.

The purpose of developing Generalized Blomqvist Correlation was to investigate whether the additional information provided by additional order statistics would give a test with a higher level of statistical significance. The results presented in this paper illustrate that this is indeed possible.
Figure 2. Experimental results of correlation tests